Lesson 4:
Fundamentals of Visual Analytics

Dr. Kam Tin Seong
Assoc. Professor of Information Systems (Practice)

School of Computing and Information Systems,
Singapore Management University

20 May 2025

What will you learn from this lesson?

  • Visual Analytics for Knowledge Discovery
  • Visual Analytics Approach for Statistical Testing
  • Visual Analytics for Building Better Models
  • Visualising Uncertainty
  • Variation and Its Discontents

Visual Analytics for Knowledge Discovery

Motivation: To combine data visualisation and statistical modeling.

Visual Statistical Testing

  • To provide alternative statistical inference methods by default.

Visual Statistical Testing

  • To follow best practices for statistical reporting.

  • For all statistical tests reported in the plots, the default template abides by the APA gold standard for statistical reporting. For example, here are results from a robust t-test:

Two-sample means

Boxplot revealing the mean and distribution of two samples.

Boxplot with two-sample mean test
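The test statistic that accompanies such a boxplot can be sketched in a few lines. The example below is a minimal illustration, assuming Welch's unequal-variance t-test (the slide does not say which test is used) and made-up scores for two hypothetical groups. It returns only the t statistic and the approximate degrees of freedom; a p-value would need a t-distribution, which Python's standard library does not provide.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Sample variances (n - 1 in the denominator), each divided by its sample size.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical scores, invented for illustration only.
group_a = [72, 85, 78, 90, 66, 81, 75, 88]
group_b = [64, 70, 59, 77, 68, 62, 73, 66]

t_stat, dof = welch_t(group_a, group_b)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

Annotating the boxplot with these numbers is what turns a purely descriptive chart into a visual statistical test.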

Visually-driven Correlation Analysis

Scatter plot showing the relationship between two continuous variables.

Scatter plot with significance test of correlation.
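The numbers behind such an annotated scatter plot can be sketched as follows. This is a minimal illustration, assuming Pearson's correlation coefficient and its usual t statistic with n - 2 degrees of freedom; the variable names and data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t(r, n):
    """t statistic for H0: no linear correlation (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical study hours and exam scores, invented for illustration only.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 60, 68, 71, 70, 79]

r = pearson_r(hours, scores)
t_stat = correlation_t(r, len(hours))
print(f"r = {r:.3f}, t = {t_stat:.2f}, df = {len(hours) - 2}")
```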

Visually-driven Association (Independence) Analysis

Mosaic plot showing the association between two categorical variables.

Stacked bar chart with significance test of association.
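A test of association between two categorical variables is commonly Pearson's chi-square test of independence. The sketch below assumes that test (the slide does not name one) and uses a hypothetical 2 x 2 table of counts. It computes the statistic and its degrees of freedom only.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical counts: rows are two groups, columns are two outcomes.
observed_counts = [[30, 10],
                   [15, 25]]

chi2, df = chi_square_independence(observed_counts)
print(f"chi-square = {chi2:.2f}, df = {df}")
```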

Visual Analytics Approach for Building Exploratory Models

Model Diagnostic: Checking for multicollinearity

Conventional statistical report

Visual Analytics approach

Visual Analytics Approach for Building Exploratory Models

Model Diagnostic: Checking normality assumption

Model Diagnostic: Checking model for homogeneity of variances

Visual Analytics Approach for Building Exploratory Models

Analysing model parameters

Conventional statistical report

Visual Analytics approach

Visualising Uncertainty

Why is it important?

  • One of the most challenging aspects of data visualization is the visualization of uncertainty.

Source: Chart 61, LABOUR FORCE IN SINGAPORE 2019, pg. 52.

Why shouldn’t one use a bar graph, even if the data are normally distributed?

  • It is not appropriate to display average values on bars.

Why do error bars fail?

  • Each error bar is constructed using a 95% confidence interval of the mean.

Error bar on a dot plot

  • Each error bar is constructed using a 95% confidence interval of the percentage.

Graphical methods for visualising uncertainty

Reference: Visualizing the uncertainty of point estimates

Back to Statistics 101

Population and samples

  • When drawing many samples from a population, it is possible to obtain a few with means that greatly differ from the population.

Reference: Cairo, A. (2016) The Truthful Art, Chapter 11, New Riders.
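The point about sample means varying can be checked with a small simulation. The sketch below is illustrative only: the population (normal, mean 50, standard deviation 10), the sample size of 30 and the number of samples are all assumed values, and the seed is fixed so the run is repeatable.

```python
import random
import statistics

random.seed(42)

# Hypothetical population of 10,000 values.
population = [random.gauss(50, 10) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Draw 1,000 samples of size 30 and record each sample mean.
sample_means = [
    statistics.mean(random.sample(population, 30)) for _ in range(1_000)
]
spread = statistics.stdev(sample_means)

print(f"population mean:       {population_mean:.2f}")
print(f"smallest sample mean:  {min(sample_means):.2f}")
print(f"largest sample mean:   {max(sample_means):.2f}")
print(f"spread of the means:   {spread:.2f}")
```

Most sample means fall close to the population mean, but the smallest and largest ones sit noticeably further away. The spread of the sample means is what the standard error estimates.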

A reminder of the standard normal distribution

Reference: Cairo, A. (2016) The Truthful Art, Chapter 11, New Riders.

The standard error

The formulas of standard deviation and standard error

Reference: Cairo, A. (2016) The Truthful Art, Chapter 11, New Riders.
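For reference, the standard textbook definitions of the two quantities are written out below. The notation is an assumption and may differ from the slide's: x_i are the n observations, x-bar is the sample mean, s is the sample standard deviation and SE is the standard error of the mean.

```latex
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2},
\qquad
SE_{\bar{x}} = \frac{s}{\sqrt{n}}
```

The standard deviation describes the spread of individual observations, while the standard error describes how much the sample mean would vary from sample to sample, which is why it shrinks as n grows.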

Calculating the confidence interval of a mean

Reference: Cairo, A. (2016) The Truthful Art, Chapter 11, New Riders.
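The calculation can be sketched as mean ± z × standard error. The example below is a minimal illustration, assuming the normal critical value z = 1.96 for a 95% interval; for small samples a t critical value would usually be used instead. The data values are made up.

```python
import math
import statistics


def mean_confidence_interval(sample, z=1.96):
    """Approximate confidence interval for a mean (normal approximation)."""
    n = len(sample)
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    margin = z * standard_error
    return mean - margin, mean + margin


# Hypothetical measurements, invented for illustration only.
values = [4, 6, 8, 10, 12]
low, high = mean_confidence_interval(values)
print(f"95% CI of the mean: ({low:.2f}, {high:.2f})")
```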

Calculating the confidence interval of a percentage

Reference: Cairo, A. (2016) The Truthful Art, Chapter 11, New Riders.
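For a percentage, the same idea applies with the standard error of a proportion, sqrt(p(1 - p) / n). The sketch below assumes the simple normal-approximation (Wald) interval, which can behave poorly for small samples or for proportions near 0% or 100%. The survey counts are hypothetical.

```python
import math


def proportion_confidence_interval(successes, n, z=1.96):
    """Approximate (Wald) confidence interval for a proportion."""
    p = successes / n
    standard_error = math.sqrt(p * (1 - p) / n)
    margin = z * standard_error
    return p - margin, p + margin


# Hypothetical survey: 540 of 1,000 respondents answered "yes".
low, high = proportion_confidence_interval(540, 1000)
print(f"95% CI: {low:.1%} to {high:.1%}")
```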

2-d graphical methods for visualising uncertainty

Scatter plot with 95% confidence ellipse

Scatter plot with 95% confidence intervals

Confidence band of a trend line

Confidence band and fit lines

A graded confidence band

Variation and Its Discontents

Random and unfair comparisons

Reference: Variation and its discontents

Funnel plots to the rescue

Statistical details
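One common way to build the funnel is to draw control limits around the overall rate that widen as the number of cases falls. The sketch below is a minimal illustration, assuming binomial rates and normal-approximation 95% limits; the unit names and counts are hypothetical, and other constructions (for example, exact binomial limits) are possible.

```python
import math


def funnel_limits(overall_rate, n, z=1.96):
    """Approximate control limits for a rate observed on n cases."""
    se = math.sqrt(overall_rate * (1 - overall_rate) / n)
    return max(0.0, overall_rate - z * se), min(1.0, overall_rate + z * se)


# Hypothetical units: (name, events, cases).
units = [("A", 4, 10), ("B", 120, 400), ("C", 45, 100), ("D", 610, 2000)]

total_events = sum(events for _, events, _ in units)
total_cases = sum(cases for _, _, cases in units)
overall = total_events / total_cases

flags = {}
for name, events, cases in units:
    lower, upper = funnel_limits(overall, cases)
    rate = events / cases
    # A unit is flagged only if its rate falls outside the funnel for its size.
    flags[name] = not (lower <= rate <= upper)
    print(f"{name}: rate={rate:.3f}, limits=({lower:.3f}, {upper:.3f}), "
          f"outside={flags[name]}")
```

Unit A has a high rate but only 10 cases, so it stays inside the wide limits for its size; unit C has a similar rate on 100 cases and falls outside. This is the sense in which a funnel plot guards against unfair comparisons between small and large units.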

Normality assumption

  • Before plotting the graph, it is important to check whether the values conform to the normality assumption.

  • If the raw values do not conform to the normality assumption, they have to be transformed.
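A rough version of this check-then-transform step is sketched below. It assumes sample skewness as a crude indicator of non-normality and a log transformation as the remedy; in practice a Q-Q plot or a formal normality test would usually be examined as well, and the right transformation depends on the data. The simulated values are hypothetical.

```python
import math
import random
import statistics


def skewness(values):
    """Sample skewness: about 0 for symmetric data, positive for a long right tail."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / n


random.seed(7)

# Hypothetical right-skewed raw values (log-normally distributed).
raw = [random.lognormvariate(0, 1) for _ in range(2_000)]
logged = [math.log(v) for v in raw]

print(f"skewness of raw values:         {skewness(raw):.2f}")
print(f"skewness of log-transformed:    {skewness(logged):.2f}")
```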